Add attention kernels optimized for arm's i8mm instruction. by copybara-service[bot] · Pull Request #942 · google/gemma.cpp

copybara-service · 2026-07-01T14:58:22Z

Add attention kernels optimized for Arm's i8mm instructions.
They give about 8x higher throughput than the previous i8 implementation.

PiperOrigin-RevId: 942147808
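For readers unfamiliar with i8mm: the extension's signed matrix multiply-accumulate (SMMLA, exposed in `arm_neon.h` as `vmmlaq_s32`) treats each 16-byte source as a 2x8 matrix of int8 values and accumulates the 2x2 int32 product of the first with the transpose of the second, i.e. 32 multiply-accumulates per instruction. The portable scalar sketch below only illustrates that arithmetic. It is not the kernel code from this PR, and the name `MmlaReference` is hypothetical.

```cpp
#include <array>
#include <cstdint>

// Scalar reference for one SMMLA step (illustrative only, not from the PR).
// `a` and `b` each hold two rows of eight int8 values, stored row-major.
// `acc` is a row-major 2x2 int32 tile: acc[2*i + j] += dot(a_row_i, b_row_j).
std::array<int32_t, 4> MmlaReference(std::array<int32_t, 4> acc,
                                     const std::array<int8_t, 16>& a,
                                     const std::array<int8_t, 16>& b) {
  for (int i = 0; i < 2; ++i) {
    for (int j = 0; j < 2; ++j) {
      int32_t sum = 0;
      for (int k = 0; k < 8; ++k) {
        // Widen to int32 before multiplying so the products cannot overflow.
        sum += static_cast<int32_t>(a[8 * i + k]) *
               static_cast<int32_t>(b[8 * j + k]);
      }
      acc[2 * i + j] += sum;
    }
  }
  return acc;
}
```

On a target compiled with i8mm support (`__ARM_FEATURE_MATMUL_INT8` defined), a single `vmmlaq_s32(acc, a, b)` call on the corresponding NEON vectors computes the same tile. How the attention kernels in this change tile and schedule those calls is not shown in the PR description.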

copybara-service Bot force-pushed the test_938467018 branch 9 times, most recently from 8d2f809 to 27b2995 on July 3, 2026 15:04

Commit 02548ac: Add attention kernels optimized for arm's i8mm instruction.

copybara-service Bot force-pushed the test_938467018 branch from 27b2995 to 02548ac on July 3, 2026 15:12

copybara-service Bot merged commit 02548ac into dev Jul 3, 2026

copybara-service Bot deleted the test_938467018 branch July 3, 2026 15:13
